Symbols
- α (alpha)
  - Bonferroni, 153–154
  - definition of, 41
  - level, 206
  - relation to sample sizes, 365–366
  - setting, 43
- * (asterisk), 155
- β (beta), 41
- λ (half‐life), 280–282
- κ (kappa), 189–190
- μg (micrograms), 280–282
- Π (pi), 27–28
- √ (radical sign), 21
- Σ (sigma), 27–28
- γ (skewness coefficient), 121
A
- absolute values, 23
- accuracy, 37, 38, 262–264
- active group, 187
- actuarial life tables, 307, 311–316, 320–321
- addition, 18–19
- additive, 296
- administrative measurements, 63
- adverse events, 70
- agriculture, 1
- Akaike’s Information Criterion (AIC), 259, 276–277, 342
- alcohol intake, 94–98
- alpha (α)
  - Bonferroni, 153–154
  - definition of, 41
  - level, 206
  - relation to sample sizes, 365–366
  - setting, 43
- alternative hypothesis, 40, 43, 144, 150–151, 324
- Alzheimer’s disease, 91, 146
- amputation, 292–293
- analysis of variance (ANOVA)
  - assessing, 152–157
  - introduction to, 11, 47–49
  - using, 143–145, 158
- analytic dataset, 76
- analytic research, 88–90
- analytic study designs, 91
- analytic suite, 57–58
- analyzing data, 7, 9–10, 74–76
- and rule, 31
- animal research, 1
- ANOVA (analysis of variance)
  - assessing, 152–157
  - introduction to, 11, 47–49
  - using, 143–145, 158
- anticipated enrollment rate, 347
- antilogarithm, 22, 118–119
- anti‐synergy, 245–247
- area under the ROC curve (AUC), 265, 280
- arguments, 23
- arithmetic mean, 115–116
- arrays, 25–27
- asbestos, 296–298
- asymptotic confidence limits, 134
- attrition, 366–367
- average values, 11, 39–40, 141–158
B
- background information, 69
- backward elimination approach, 295
- bad fit line, 215–216
- balanced groups, 154
- bar charts, 113, 126
- base‐2 logarithms, 22
- base‐10 logarithms, 22
- base‐e logarithms, 22
- baseline survival function, 342–344
- baseline values, 96
- beta (β), 41
- bimodal (two‐peaked) distribution, 114, 117
- binary logarithms, 22
- binary variables, 173–174, 177, 236–238
- binomial distribution, 36, 354–355
- biology, 1
- biopsy specimens, 188
- biostatisticians, 60, 141, 161, 168–169
- biostatistics, definition of, 1, 7
- bivariate analysis, 11, 128, 145, 160
- blinding, 66, 70, 171
- blocked randomization, 67
- blood pressure
  - of study participants, 116–117
  - variable name, 18
- body mass index (BMI), 177
- bolus, 280
- Bonferroni adjustment, 153–156
- box‐and‐whiskers charts, 127–128
- Bradford Hill’s criteria of causality, 297–298
C
- calculated values, 37
- calculations, 8
- cancer
  - as a categorical variable, 236–238
  - liver, 93–97
  - lung, 296–298
  - relation to weight, 11
  - remission, 318, 325
  - stages, 102
  - survival data, 208, 301–306, 337, 343
- candidate covariates, 291
- cannabidiol (CBD), 159–166
- car accidents, 274–278
- case report forms (CRFs), 74
- case reports, 91
- case series, 91
- case studies, 91
- case‐control study, 90, 93–96
- CAT (computerized tomography) scans, 188
- categorical data, 112–114
- categorical variables, 236–238
- causal inference, 11–12, 87, 90–95, 247
- CBD (cannabidiol), 159–166
- censoring, 302–306, 329, 335
- census, 33–34
- center, 115
- centiles, 120
- central tendency, 115
- central‐limit theorem (CLT), 134
- charts and charting. See also graphing
  - bar and pie, 113, 126
  - box‐and‐whiskers, 127–128
  - categorical data, 112–114
  - correlation coefficients, 202–203
  - hazard rates and survival probabilities, 312–315
  - multiple regression, 243–245
  - numerical data, 124–128
  - Poisson regression, 273–276
  - Receiver Operator Characteristics (ROC), 264–265
  - residuals, 222–223
  - scatter plots, 214, 219, 221, 238–240
  - software for, 57–58
  - s‐shaped data, 252–256
  - student t test, 45–47
- CHD (coronary heart disease), 92–93
- chemotherapy, 337
- chi‐square distribution, 165–166, 358–359
- chi‐square test
- chronic obstructive pulmonary disease (COPD), 302
- CI (confidence interval)
- CIR (cumulative incidence rate), 178–179
- CL (confidence level), 39
- CL (confidence limits), 130, 134
- classification tables, 262–264
- ClinCalc, 172
- clinical center variable, 338
- clinical trials, 9–10, 61–74
- cloud‐based storage, 60
- CLT (central‐limit theorem), 134
- cluster sampling, 84
- Cochrane Collaboration, 297
- code, 241
- code‐based methods, 60
- coding categories, 105–107
- coefficient of determination, 227
- coefficient of variation (CV), 120
- Cohen’s Kappa, 189
- cohort study, 90, 93–96
- collecting data
  - introduction to, 1–2, 9–10
  - manually and digitally, 74, 101–110
- collinearity, 246–247, 266
- commercial software, 54–58
- common logarithms, 22
- comparing averages, 142
- complete separation, 267–268
- complicated formulas, 24
- Comprehensive R Archive Network (CRAN), 58
- computer science, 18
- computerized tomography (CAT) scans, 188
- concordance, 342
- confidence interval (CI)
- confidence level (CL), 39
- confidence limits (CL), 130, 134
- confidentiality, 71
- confounding
  - adjusting for, 145, 294–296
  - criteria for, 292
  - definition of, 66, 94
  - residual, 68
- constants, 17
- contingency tables, 173–178, 184, 187–188
- control group, 98, 154, 187
- convenience sampling, 84–85
- COPD (chronic obstructive pulmonary disease), 302
- coronary heart disease (CHD), 92–93
- correlation, 40, 201–202, 220
- correlation coefficient
  - analyzing, 203–207
  - description of, 202–203
  - example of, 10, 40
  - straight‐line regression, 227–228
  - table, 242
- correlational studies, 91–93
- COVID‐19, 183–184
- Cox, David (biostatistician), 330
- Cox proportional hazards regression, 330
- Cox/Snell R‐square, 259
- CRAN (Comprehensive R Archive Network), 58
- CRFs (case report forms), 74
- crossover design, 65
- cross‐sectional studies, 87–96, 176
- cross‐tabulated data
  - analyzing, 160–167, 171–172
  - example of, 112–113, 166
  - introduction to, 11
  - relation to logistic regression, 250
  - tables, 173–174
- cumulative incidence rate (CIR), 178–179
- cumulative survival probability, 311–314
- curved‐line relationships, 214
- CV (coefficient of variation), 120
D
- data. See also cross‐tabulated data
  - analyzing and collecting, 1, 7, 9–10, 74–76, 101–110
  - categorical, 112–114
  - free‐text, 103
  - interval and ordinal, 102
  - ratio, 102, 107–108
  - skewed and unskewed, 11, 114, 121, 353
  - survival, 208, 301–306, 337, 343
  - time, 108–110
- data close‐out, 76
- data dictionary, 110
- data safety monitoring board (DSMB), 73, 76
- data safety monitoring committee (DSMC), 73
- data snapshot, 76
- date data, 108–110
- date of last contact, 303
- DBP (diastolic blood pressure), 116–117, 123–124
- deciliter (dL), 280
- decision theory, 10, 39–40
- degrees of freedom (df)
  - calculating, 147–148, 152–153
  - for chi‐square tests, 166–167, 358–359
- dementia, 91, 146, 187–188
- denominator, 41–42
- dependent variable, 208, 213, 235–245
- descriptive research, 88–90
- descriptive study designs, 91
- desired power, 347
- desired α level, 347
- determinants, 191
- deviation, 119, 258–259
- df (degrees of freedom)
  - calculating, 147–148
  - for chi‐square test, 166–167, 358–359
  - numerator and denominator, 152
- diabetes, 135–136, 236–240. See also Type II diabetes
- diagnostic procedures, 183–188
- diastolic blood pressure (DBP), 116–117, 123–124
- dichotomous variables, 173–174, 249, 269
- difference, 147–148, 163
- difference table, 163
- disease, 191, 193–194
- dispersion, 115, 119
- distribution center, 115
- distributions
  - bimodal (two‐peaked), 114, 117
  - binomial, 36, 354–355
  - chi‐square, 165–166, 358–359
  - exponential, 356
  - Fisher F, 152, 359–360
  - frequency, 47–48
  - leptokurtic and platykurtic, 122
  - normal, 13, 36, 114, 353
  - probability, 35–37
  - sampling, 38
  - statistical, 13
  - student t, 357–358
  - Weibull, 330, 356–357
- District of Columbia, 34–35
- division, 20
- dL (deciliter), 280
- dose‐response relationship, 298
- double‐blinding, 66, 97
- double‐precision numbers, 107
- drug description, 70
- drug development research, 280–282
- DSMB (data safety monitoring board), 73, 76
- DSMC (data safety monitoring committee), 73
- Dupont, William D. (biostatistician), 60
E
- ECG (electrocardiogram), 188
- ecologic fallacy, 93
- ecologic studies, 91–93
- effect modification, 296–297
- effect size
  - compared to power and sample size, 45–47
  - definitions of, 362–364
  - example of, 39
  - of importance, 158, 206, 361
- efficacy objectives, 62–63
- electrocardiogram (ECG), 188
- elements, 24, 27–28
- elimination rate constant, 282
- engineering, 18
- enzyme levels, 125–128
- Epi Info, 59
- epidemiological research, 1, 9–12
- epidemiology, 191, 291–298
- Epidemiology For Dummies (Mitra), 92
- equal variance, 143
- equations, 15, 24–25, 35–36
- error, 78
- estimate value, 242
- estimation theory, 10
- event status variable, 337
- evidence levels, 91
- Excel (Microsoft)
  - for data collection, 103, 105, 107–110
  - functions of, 57
  - for log‐rank tests, 319–320
  - for randomization, 67
  - for straight‐line regression, 217
  - for survival regression, 343
- exclusion, 74
- exclusion criteria, 64
- expected count, 164
- expected survival, 329, 331, 343–346
- experimental research, 61, 88–90, 97–98
- experiments, 1, 9–10, 91
- expert opinion, 91
- explicit constants, 17
- exploratory analysis, 294
- exploratory efficacy objective, 63
- exploratory objectives, 62–63
- exponential distribution, 356
- exponential increase, 277
- exponentiating, 21
- exposure, 83, 175, 178, 193, 292
- extrapolation, 108
F
- F ratio, 152
- F statistic, 228, 242
- F value (value of F statistic), 155
- factorials, 22
- failure times, 356–357
- fasting glucose values, 25–26, 153
- fat intake, 92–93
- FDA (Food and Drug Administration), 71
- first quartile, 222
- Fisher, Ronald Aylmer (biostatistician), 196
- Fisher Exact test, 11, 13, 169–172
- Fisher F distribution, 152, 359–360
- Fisher z transformation, 204–205
- fleas, 7
- floating point numbers, 107
- Food and Drug Administration (FDA), 71
- formulas
  - building blocks of, 17
  - creating, 11–12
  - definition of, 15
  - introduction to, 1–2, 7–8
  - types of, 16, 24
- forward stepwise approach, 294–295
- fourfold tables, 11, 173–178, 184, 187–188
- fractional numbers, 107
- free software, 58–60
- free‐text data, 103
- frequency bar charts, 113
- frequency distribution, 47–48
- functions, 23
G
- gamma‐ray radiation, 251–256, 260–263, 267, 337–338
- Gaussian distribution, 353
- generalized linear model (GLM), 272–278
- genetics, 1
- geometric mean (GM), 118–119
- GLM (generalized linear model), 272–278
- glucose values, 25–26, 153
- gold standard test, 183–184
- good fit line, 215–216
- goodness of fit, 227–228, 258–259
- G*Power
- graphing. See also charts and charting
  - categorical data, 112–114
  - correlation coefficients, 202–203
  - hazard rates and survival probabilities, 312–315
  - multiple regression, 243–245
  - numerical data, 124–128
  - Poisson regression, 273–276
  - Receiver Operator Characteristics (ROC), 264–265
  - residuals, 222–223
  - software for, 57–58
  - s‐shaped data, 252–256
  - student t test, 45–47
- GraphPad, 67
- Greek letters, 17
- GUI (graphical user interface), 56, 58
H
- h value, 344–346
- half‐life (λ), 280–282
- hazard rate
  - definition of, 305
  - from life tables, 311–315
  - relation to survival rate, 333
- hazard ratios (HR), 334–335, 340, 364
- health insurance, 112–114
- healthcare, 9–10
- highway accidents, 274–278
- Hill, Bradford (epidemiologist)
  - Bradford Hill’s criteria of causality, 297–298
- histogram, 34–35, 124–125
- historical control, 142
- H‐L test, 259
- homogeneity of variances, 155
- hormone concentration, 287–290
- Hosmer‐Lemeshow Goodness of Fit test, 259
- HR (hazard ratios), 334–335, 340, 364
- HTN (hypertension), 90, 94–98, 177–178
- human health research, 88
- human subjects protection certification, 73
- hyperplane, 234
- hypertension (HTN), 90, 94–98, 177–178
- hypothesis, 40–47, 63, 94, 247. See also null hypothesis
- hypothesis‐driven analysis, 294
- hypothesized cause, 83, 175, 178, 193, 292
I
- ICF (Informed Consent Form), 72–73
- ICH (International Conference on Harmonization), 72
- icons explained, 3
- identification (ID) numbers, 104
- identity line, 245
- imputation, 75
- incidence, 191–198
- incidence rate, 192–198
- inclusion criteria, 64
- independent t test, 148
- independent variable, 208–210, 213, 215, 291–294
- indicator variables, 237–238
- indices, 174
- individual‐level data, 160
- inferential statistics, 34, 77
- inferring, 10
- infinity, 33
- influenza, 192
- Informed Consent Form (ICF), 72–73
- inner mean, 118
- integers, 107
- interaction, 296–297
- interaction terms, 237, 329
- intercept, 215, 221, 224–225, 272, 329
- intercept row, 224
- interim analysis, 76
- International Conference on Harmonization (ICH), 72
- Institutional Review Board (IRB), 72
- Internet sources. See also G*Power; Microsoft Excel
  - ClinCalc, 172
  - Comprehensive R Archive Network (CRAN), 58
  - Epi Info, 59
  - GraphPad, 67
  - International Business Machines, 57
  - National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
  - National Institutes of Health (NIH), 72–73
  - OpenStat and LazStats, 59
  - Power and Sample Size Calculation (PS), 60, 171–172, 324–325
  - SAS OnDemand for Academics (ODA), 55–56
  - Statista, 34
  - StatPages, 158, 190, 282
- interpolation, 208
- inter‐quartile range (IQR), 120
- inter‐rater reliability, 188–190
- interval data, 102
- intervention‐related measurements, 63
- interventions, 61–62
- intra‐rater reliability, 188–190
- IQR (inter‐quartile range), 120
- IRB (Institutional Review Board), 72
- iterative models, 246–247
K
- Kaplan‐Meier method, 313–316, 328, 330
- Kaplan‐Meier (K‐M) survival estimate, 313, 317
- kappa (κ), 189–190
- kilograms (kg), 218–220, 225–226, 239
- Kruskal, William (statistician), 141
- Kruskal‐Wallis test, 47–49, 144, 157
- kurtosis, 121–122
L
- Last Observation Carried Forward (LOCF), 75
- last‐seen date, 303
- LazStats, 59
- least‐squares line, 234
- left skewed data, 121
- leptokurtic distribution, 122
- lethal dose, 262
- levels of measurement, 102–103
- LFU (lost to follow‐up), 303–304, 308–309
- life sciences, 1
- life‐table method, 307, 311–316, 320–321
- Likert scale, 102, 106
- line of best fit, 215–216
- linear combination, 329
- linear function, 210–211
- linear model, 272–273
- linear regression, 210
- link function, 273–274
- liver cancer, 93–97
- locally weighted scatterplot smoothing (LOWESS) curve‐fitting, 12, 286–290
- LOCF (Last Observation Carried Forward), 75
- logarithms, 21–22, 118–119
- logistic regression
  - basics of, 251–254
  - definition of, 12, 210
  - disadvantages of, 266–268
  - evaluating, 257–265
  - sample size for, 268–269
  - using, 249–250, 255–257
- log‐normal distribution, 36, 125, 353–354
- log‐rank test, 317–325, 328–330
- longitudinal research, 90
- lost to follow‐up (LFU), 303–304, 308–309
- LOWESS (locally weighted scatterplot smoothing) curve‐fitting, 12, 286–290
- lung cancer, 296–298
M
- Mann, Henry (professor), 141
- Mann‐Whitney U test, 47–49, 143, 362
- Mantel‐Cox test. See log‐rank test
- Mantel‐Haenszel chi‐square test, 168
- margin of error (ME), 134
- marginal totals, 160
- masking, 66, 70, 171
- mathematical expressions, 15
- mathematical operations, 18–25
- matrix, 26
- matrix algebra, 26
- maximum value, 222
- ME (margin of error), 134
- mean
  - arithmetic, 115–116
  - compared to other values, 142–157, 362
  - confidence limits, 134–135
- mean square (mean Sq), 155
- measurements, 63–64, 102–103
- mechanical function, 279
- median, 116–117, 123, 222
- meta‐analyses, 97–98
- metadata, 110
- mice, 1, 318
- micrograms (μg), 280–282
- Microsoft Excel
  - for data collection, 103, 105, 107–110
  - functions of, 57
  - for log‐rank tests, 319–320
  - for randomization, 67
  - for straight‐line regression, 217
  - for survival regression, 343
- millimeters of mercury (mmHg), 218–226, 229, 239
- minimum value, 222
- missing data, 74–75
- Mitra, Amal K. (author)
  - Epidemiology For Dummies, 92
- mmHg (millimeters of mercury), 218–226, 229, 239
- mode, 117
- model building, 246
- model fit statistics, 242
- models
  - generalized linear (GLM), 272–278
  - linear, 272–273
  - null, 228, 242, 259
  - parsimonious, 293
  - predictive, 228–229
  - regression, 68, 208–209
- molecular biology, 7
- multicollinearity, 246–247
- multi‐dimensional arrays, 26–27
- multilevel variable, 236
- multiple regression
  - basics of, 234–235
  - introduction to, 26
  - sample size for, 247–248
  - special considerations, 245–247
  - using, 236–245
- multiple R‐squared, 242
- multiplication, 18–20
- multiplicative, 296
- multiplicity, 75–76
- multi‐site study, 104
- multi‐stage sampling, 85–86
- multivariable regression, 291
- multivariate analysis, 128, 145, 291
N
- Nagelkerke R‐square, 259
- National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
- National Institutes of Health (NIH), 72–73
- natural logarithms, 22
- negative predictive value (NPV), 187
- negatively skewed data, 121
- NHANES (National Health and Nutrition Examination Survey), 82, 86, 93, 148–157
- NIH (National Institutes of Health), 72–73
- nominal variables, 102
- non‐code‐based methods, 60
- nonlinear function, 211
- nonlinear least‐squares regression, 12
- nonlinear regression, 279–286
- nonlinear trends, 277
- nonparametric regression, 286–290
- nonparametric tests, 48–49, 157
- non‐proportional hazards, 323
- non‐sampling error, 78
- non‐steroidal anti‐inflammatory drugs (NSAIDs), 159–166
- normal distribution, 13, 36, 114, 353
- normal Q‐Q graph, 222–223
- normal‐based confidence intervals, 134
- normal‐based confidence limits, 134
- normality assumption, 143
- not rule, 31
- notches, 128
- NPV (negative predictive value), 187
- NSAIDs (non‐steroidal anti‐inflammatory drugs), 159–166
- nuisance variables, 145
- null hypothesis
- null model, 228, 242, 259
- numerator, 41–42
- numerical data, 107–109, 114–123
O
- obesity, 177–178, 182–183
- observational research, 88–90
- observed count, 164
- observed versus predicted graph, 245
- ODA (SAS OnDemand for Academics), 55–56
- odds, 32–33, 181
- odds ratio (OR), 94–96, 181–183, 266–267
- Office for Human Research Protections (OHRP), 72
- one‐dimensional arrays, 25
- one‐group t test, 148
- one‐sided confidence interval, 133
- one‐way ANOVA, 144
- open‐source software, 58–59
- OpenStat, 59
- OR (odds ratio), 94–96, 181–183, 266–267
- or rule, 31
- order of operation, 24
- ordinal data, 102
- ordinary multiple linear regression model. See multiple regression
- ordinary regression, 210
- outcome, 208
- outcome‐related measurements, 64
- outliers, 229
- overall accuracy, 185
P
- p value
- paired t test, 148
- paired values, 363
- parabolic relationship, 214, 252–253
- parallel design, 65
- parameters, 33–34, 77, 208, 279
- parametric tests, 48–49
- parsimonious models, 293
- parsimony, 293
- participant identification (ID), 240
- participant study identifier, 104
- participants. See also sample size; samples
  - enrolling, 68
  - protection for, 71–73
  - selecting, 64–65, 236–237
- PatSat, 106
- PCR (polymerase chain reaction), 184
- Pearson, Karl (biostatistician), 161
- Pearson Correlation test, 47–49, 227
- Pearson kurtosis index, 122
- percentile, 120
- perfect predictor problem, 267–268
- perfect separation, 267–268
- periodicity, 83
- PH (proportional hazards regression), 330–331, 333–334
- pharmacokinetic (PK) properties, 280–282
- pi (Π), 27–28
- pie charts, 113
- pilot study, 230
- placebo, 66–67, 171, 187–188
- placebo effect, 66, 187–188
- plain text format, 16, 24
- platykurtic distribution, 122
- Plummer, Walton D. (biostatistician), 60
- pointy‐topped distribution, 114
- Poisson distribution, 36, 355
- Poisson regression
- polymerase chain reaction (PCR), 184
- populations, 33–37, 175
- positive predictive value (PPV), 187
- positively skewed data, 121
- post‐hoc tests, 143, 152–157
- potential confounding variables, 64
- Power and Sample Size Calculation (PS)
  - for chi‐square and Fisher exact tests, 171–172
  - definition of, 60
  - for survival comparisons, 324–325
- power calculations, 47, 171–172, 361, 365
- powers, 20–21, 41, 44, 45–47, 206
- PPV (positive predictive value), 187
- precision, 37, 38, 147
- predicted values, 242
- predictive model, 228–229
- predictive value negative, 187
- predictive value positive, 187
- predictors
  - introduction to, 208, 233
  - in iterative models, 246–247
  - in logistic models, 255–256
  - in regression models, 273–274, 279
  - relation to the outcome, 242, 245–246, 250
  - types of, 209, 235–236
- pregnancy, 171, 185–187
- prevalence, 179, 186, 191–194
- prevalence ratio, 179
- primary diagnosis (PrimaryDx), 236–238
- primary efficacy objective, 62
- primary objectives, 62
- primary sampling units (PSU), 86
- privacy, 71
- probability, 30–33
- probability bell curve, 353
- probability distributions, 35–37
- probability of independence, 166
- procedural descriptions, 70
- product, of an array, 27
- prognosis curves, 329, 331, 343–346
- proportional hazards (PH) regression, 330–331, 333–334
- proportions, 11, 135–136, 363
- protective factor, 178
- protractor, 113
- PS (Power and Sample Size Calculation)
  - for chi‐square and Fisher exact tests, 171–172
  - definition of, 60
  - for survival comparisons, 324–325
- pseudo‐r‐squared values, 259
- PSU (primary sampling units), 86
- Python, 58
R
- R (software)
  - description of, 58
  - nonlinear regression, 282–286
  - odds ratio calculation, 183
  - risk ratio calculation, 180–181
  - straight‐line regression, 221
- r value, 203–206
- radiation exposure, 251–256, 260–263, 267, 337–338
- radical sign (√), 21
- random number generator (RNG), 80–81
- random shuffling, 67
- random variability, 158
- randomization, 97, 171
- randomized controlled trials (RCTs), 65–67, 98
- randomness, 33
- range, 120
- ranks, 49
- rate ratio (RR), 195–196
- ratio data, 102, 107–108
- rationale, 69
- RCTs (randomized controlled trials), 65–67, 98
- Receiver Operator Characteristics (ROC), 257, 264–265
- reference level, 237
- regression
  - logistic
    - basics of, 251–254
    - definition of, 12, 210
    - disadvantages of, 266–268
    - evaluating, 257–265
    - sample size for, 268–269
    - using, 249–250, 255–257
  - multiple
    - basics of, 234–235
    - introduction to, 26
    - sample size for, 247–248
    - special considerations, 245–247
    - using, 236–245
  - multivariable, 291
  - ordinary, 210
  - straight‐line
    - basics of, 215–216
    - disadvantages of, 229–231
    - evaluating, 220–224
    - using, 216–220
    - when to use, 213–215
  - survival
    - concepts of, 329–335
    - definition of, 210
    - evaluating, 337–343
    - sample size for, 346–347
    - using, 335–336
    - when to use, 328–329, 343
  - univariate, 209
- regression analysis, 12, 207–208
- regression models, 68, 208–209
- relative frequency, 30
- relative risk, 95, 178–181
- REM (Roentgen Equivalent Man), 251–254, 260–262, 267
- research
  - analytic, descriptive and observational, 88–90
  - animal, 1
  - epidemiological, 1, 9–12
  - experimental, 61, 88–90, 97–98
  - human health, 88
  - longitudinal, 90
- research studies, 1
- residual information, 242
- residual standard error, 222, 242
- residuals, 222–224, 242–245
- residuals versus fitted graph, 222–223
- retinopathy, 292–293
- right skewed data, 121
- risk ratio, 96, 178–181
- RMS (root‐mean‐square), 222
- RNG (random number generator), 80–81
- ROC (Receiver Operator Characteristics), 257, 264–265
- Roentgen Equivalent Man (REM), 251–254, 260–262, 267
- Roman letters, 17
- root‐mean‐square (RMS), 222
- roots, 21
- Rothman’s causal pie, 297–298
- round‐off error, 352
- RR (rate ratio), 195–196
- RStudio, 58
- Rumsey, Deborah J. (author)
  - Statistics For Dummies, 2, 29
  - Statistics II For Dummies, 29
S
- safety considerations, 70
- safety objectives, 62–63
- safety study, 62
- sample size
  - for chi‐square and Fisher exact tests, 171–172
  - for cohort studies, 95–96
  - compared to power and effect size, 44–47
  - for comparing averages, 158
  - for correlation tests, 206–207
  - estimating, 198, 361–367
  - introduction to, 14
  - for logistic regression, 268–269
  - for multiple regression, 247–248
  - relation to confidence intervals, 130–131
  - for straight‐line regression, 230–231
  - for survival comparisons, 324–325
  - for survival regression, 346–347
- sample statistic, 175
- samples. See also sample size
  - framing, 78–79
  - introduction to, 9–10, 14
  - selecting, 33–37, 68, 78–80
  - types of, 80–86
- sampling clusters, 83–84
- sampling distribution, 38
- sampling error, 34, 78
- sampling frame, 78
- sampling strategies, 175–176
- SAP (Statistical Analysis Plan), 70
- SAS (Statistical Analysis System), 54–56, 58, 110
- SAS OnDemand for Academics (ODA), 55–56
- SBP (systolic blood pressure)
  - comparing, 39
  - effect of drugs on, 62, 123–125
  - relation between weight and, 218–229
  - variable name of, 18
- scatter plots
- Scheffe’s test, 154–156
- scientific notation, 28
- screening tests, 184–187
- SD (standard deviation), 119, 123, 143, 222
- SE (standard error)
  - calculating, 147–148, 179–180
  - of coefficients, 225–226, 340
  - compared to confidence intervals, 130–131
  - description of, 38, 129, 163, 242
  - in a fraction, 41
- secondary efficacy objective, 62
- secondary objectives, 62
- sensitivity, 185–186, 262–264
- sigma (Σ), 27–28
- significance, 40, 42–43, 174
- significance tests. See statistical tests; specific test names
- significant association, 11
- significant correlation, 363–364
- simple formulas, 24
- simple random samples (SRS), 80–81
- simple randomization, 66
- simple regression, 209
- simulation, 79
- single‐blinding, 66
- single‐precision numbers, 107
- single‐site study, 104
- skewed data, 11, 114, 121, 353
- skewness, 121
- skewness coefficient (γ), 121
- slope row, 224
- slopes, 215–217, 221, 224–231
- smoking, 296–298, 334–335
- smoothing fraction, 289–290
- Social Science Statistics, 170
- software
  - for data collection, 105–110, 241
  - evolution of, 54
  - introduction to, 8
  - for logistic regression, 256–257
  - for power calculations, 47
  - for straight‐line regression, 217
  - types of, 54–60
  - variables in, 18
- Spearman Rank Correlation test, 47–49
- specificity, 185–186, 262–264
- spot‐checking, 109
- SPSS (Statistical Package for the Social Sciences), 54–58
- SQL (Structured Query Language), 110
- square root, 332
- square root law, 131
- squaring, 332
- SRS (simple random samples), 80–81
- s‐shaped relationship, 214, 252–256
- SSQ (sum of squares), 155, 215–216, 234
- standard deviation (SD), 119, 123, 143, 222
- standard error (SE)
  - calculating, 147–148, 179–180
  - of coefficients, 225–226, 340
  - compared to confidence intervals, 130–131
  - description of, 38, 129, 163, 242
  - in a fraction, 41
- statistic, definition of, 40
- Statistical Analysis Plan (SAP), 70
- Statistical Analysis System (SAS), 54–56, 58, 110
- statistical distributions, 13
- statistical estimation theory, 10, 37–39
- statistical inference, 10, 37
- Statistical Package for the Social Sciences (SPSS), 54–58
- statistical tests, 8, 40–41, 44, 47–49. See also tests
- statistically rare, 93
- Statistics For Dummies (Rumsey), 2, 29
- Statistics II For Dummies (Rumsey), 29
- StatPages, 158, 190, 282
- stepped line charts, 312–313
- stepwise selection, 295
- storage modes, 107
- straight‐line regression
  - basics of, 215–216
  - disadvantages of, 229–231
  - evaluating, 220–224
  - using, 216–220
  - when to use, 213–215
- stratified samples, 81–82
- strong linear relationship, 214
- Structured Query Language (SQL), 110
- student t distribution, 357–358
- student t test, 41–42, 47–49, 142–152, 362–363
- student t value, 226
- study design, 2, 7, 14, 88–90
- study protocol, 68–71
- study rationale, 69
- study title, 69
- subtraction, 18–19
- sum, 27
- sum of squares (SSQ), 155, 215–216, 234
- summarizing data
  - categorical, 112–114
  - numerical, 114–123
  - survival, 302–316
- summary statistics, 111
- surveillance, 89–90
- survival analysis, 13, 68, 301
- survival curve shapes, 329–330
- survival data, 208, 301–306, 337, 343
- survival rate, 305, 364
- survival regression
  - concepts of, 329–335
  - definition of, 210
  - evaluating, 337–343
  - sample size for, 346–347
  - using, 335–336
  - when to use, 328–329, 343
- survival time, 301–306, 317–325
- symbolic constants, 17
- symmetry, 115
- synergy, 245–247
- systematic error, 37
- systematic sampling, 82–83
- systematic reviews, 97–98
- systolic blood pressure (SBP)
  - comparing, 39
  - effect of drugs on, 62, 123–125
  - relation between weight and, 218–229
  - variable name of, 18
T
- t tests, 41–42, 47–49, 142–152, 362–363
- t value, 242
- Tableau, 57, 60
- terminal elimination rate constant, 12
- test statistic, 37, 40, 41–42, 147
- tests
  - chi‐square, 11, 13, 161–169, 171–172, 174
  - Fisher exact, 11, 13, 169–172
  - H‐L, 259
  - log‐rank, 317–325, 328–330
  - Mann‐Whitney U, 47–49, 143, 362
  - nonparametric, 48–49, 157
  - post‐hoc, 143, 152–157
  - Scheffe’s, 154–156
  - Spearman Rank Correlation, 47–49
  - student t, 41–42, 47–49, 142–152, 362–363
  - Tukey‐Kramer, 154–156
  - unequal‐variance t, 143
  - Wilcoxon Signed‐Ranks (WSR), 47–49, 142, 146
  - Wilcoxon Sum‐of‐Ranks, 47–49, 143, 157, 362
- theoretical function, 279
- third quartile, 222
- three‐way ANOVA, 144–145
- tied values, 49
- time data, 108–110
- time‐to‐event variable, 337
- treatment bias, 66
- treatment periods, 65
- treatments, 187–188
- trees, 1
- trend line, 276
- trimmed mean. See inner mean
- true value, 37
- Tukey‐Kramer test, 154–156
- Tukey’s HSD (“honestly” significant difference test), 154–156
- two‐dimensional arrays, 26
- two‐peaked (bimodal) distribution, 114, 117
- type I error, 41, 42–44, 75–76, 152
- Type II diabetes, 10–12, 192–197, 250, 292
- type II error, 41–44
- typeset format, 16, 24–25
- typographic effects, 16
U
- ultrasound, 188
- unbalanced groups, 154
- under‐coverage, 78
- unequal‐variance t test, 143
- uniform distribution, 352
- United States
  - airports, 34–35
  - census, 84
  - Institutional Review Board, 72
  - surveillance study, 86
- univariate analysis, 128
- univariate regression, 209
- Universität Düsseldorf, 59
- unskewed data, 121
V
- value of F statistic (F value), 155
- values. See also p value
  - absolute, 23
  - average, 11, 141–158
  - calculated, 37
  - estimate, predicted and t values, 242
  - F value, 155
  - h value, 345–346
  - paired, 363
  - positive predictive, 187
  - pseudo‐r‐squared, 259
  - r value, 203–206
  - tied, 49
- Vanderbilt University, 60
- Vanderbilt University Medical Center, 324–325
- variable names, 110
- variable width, 128
- variables
  - binary, 173–174, 177, 236–238
  - categorical, 236–238
  - clinical center, 338
  - dependent, 208, 213, 235–245
  - dichotomous, 173–174, 249, 269
  - event status, 337
  - independent, 208–210, 213, 215, 291–294
  - indicator, 237–238
  - multilevel, 236
  - nominal, 102
  - nuisance, 145
  - potential confounding, 64
- variance, 119, 143, 155
- variance table, 155
- viruses, 1
- Viya, 56, 60
- volume of distribution (Vd), 280–282
W
- Wallis, Wilson Allen (statistician), 141
- washout intervals, 65
- waves, 96, 176
- weak linear relationship, 214
- websites. See also G*Power; Microsoft Excel
  - ClinCalc, 172
  - Cochrane, 297
  - Comprehensive R Archive Network (CRAN), 58
  - Epi Info, 59
  - GraphPad, 67
  - International Business Machines, 57
  - National Health and Nutrition Examination Survey (NHANES), 82, 86, 93, 148–157
  - National Institutes of Health (NIH), 72–73
  - OpenStat and LazStats, 59
  - Power and Sample Size Calculation (PS), 60, 171–172, 324–325
  - SAS OnDemand for Academics (ODA), 55–56, 57
  - Social Science Statistics, 170
  - Statista, 34
  - StatPages, 158, 190, 282
- Weibull distribution, 330, 356–357
- weight, 218–220, 225–226, 239–240, 246
- Welch, Bernard Lewis (statistician), 141
- Welch test, 143, 150–151
- Whitney, Donald Ransom (statistician), 141
- whole numbers, 107
- Wilcoxon, Frank (statistician), 141
- Wilcoxon Signed‐Ranks (WSR) test, 47–49, 142, 146
- Wilcoxon Sum‐of‐Ranks test, 47–49, 143, 157, 362
- withdrawal criteria, 64
- World War II, 71, 264
Y
- Y variable, 213–216, 224–228
- Yates, Frank (statistician), 168–169
- Yates continuity correction, 168–169